ExPairT-LLM: Exact Learning for LLM Code Selection by Pairwise Queries
Yuviler, Tom, Drachsler-Cohen, Dana
Despite recent advances in LLMs, the task of code generation is still challenging. To cope, code selection algorithms select the best program from multiple programs generated by an LLM. However, existing algorithms can fail to identify the correct program, either because they fail to distinguish nonequivalent programs or because they rely on an LLM and assume it always correctly determines the output for every input. We present ExPairT-LLM, an exact learning algorithm for code selection that selects a program by posing two new types of queries to an LLM oracle: pairwise membership and pairwise equivalence. These queries are simpler for LLMs and enable ExPairT-LLM to identify the correct program through a tournament, which is robust to some LLM mistakes. We evaluate ExPairT-LLM on four popular code datasets. Its pass@1 (success rate) outperforms the state-of-the-art code selection algorithm on average by +13.0% and up to +27.1%. It also improves the pass@1 of LLMs performing complex reasoning by +24.0%.
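The tournament idea at the heart of the abstract can be sketched as follows. This is a simplified stand-in, not the authors' algorithm: a generic pairwise oracle `prefer(a, b)` (a hypothetical placeholder for ExPairT-LLM's pairwise membership/equivalence queries to an LLM) picks the more-likely-correct of two candidates, and a single-elimination tournament reduces the pool to one winner, tolerating some oracle mistakes on early-round losers.

```python
# Single-elimination tournament over candidate programs using a
# pairwise oracle. `prefer(a, b)` returns whichever of the two
# candidates it judges more likely to be correct. This is an
# illustrative stand-in for ExPairT-LLM's pairwise LLM queries.

def tournament_select(candidates, prefer):
    """Return a single winner by repeated pairwise comparison."""
    round_ = list(candidates)
    while len(round_) > 1:
        next_round = []
        # Compare candidates in pairs; an odd one out advances for free.
        for i in range(0, len(round_) - 1, 2):
            next_round.append(prefer(round_[i], round_[i + 1]))
        if len(round_) % 2 == 1:
            next_round.append(round_[-1])
        round_ = next_round
    return round_[0]

# Toy oracle: prefer the program whose output on a probe input
# matches a reference answer (stands in for an LLM judgment).
def make_oracle(probe, expected):
    def prefer(a, b):
        try:
            ok_a = a(probe) == expected
        except Exception:
            ok_a = False
        return a if ok_a else b
    return prefer
```

With this structure, a correct program only needs to win its own matches to surface, rather than requiring the oracle to be right on every possible comparison.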
Reviews: Sampling for Bayesian Program Learning
I found this paper interesting and well-written, but I have some significant questions and comments about the approach. The paper argues that sampling is useful because we can find the C most frequently sampled programs and show them to a user. As shown in Figure 6, there is more likely to be a correct program in the top 3 programs than in the top 1. But if we want to show the top C programs, do we really need to perform sampling, which the paper says is complicated by the existence of many long and unlikely programs that match the training examples? Why can't we simply find the MDL program and then run the solver again with length restrictions to find other consistent programs of the same length, or slightly longer lengths?
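The alternative the review proposes (find the MDL program, then search again with a length budget to collect other consistent programs of equal or slightly greater size) can be illustrated on a toy expression language. The grammar, size measure, and `slack` parameter below are invented for illustration; a real system would use a constraint solver rather than brute-force enumeration.

```python
from itertools import product

# Toy DSL: expressions over a variable x, small constants, + and *.
# Enumerate expressions in order of size and keep those matching all
# training examples: "MDL program first, then slightly longer
# consistent programs", with no sampling involved.

LEAVES = ["x", "1", "2", "3"]
OPS = ["+", "*"]

def exprs_of_size(n):
    """All expressions whose parse tree has exactly n leaves."""
    if n == 1:
        yield from LEAVES
        return
    for k in range(1, n):
        for left, op, right in product(exprs_of_size(k), OPS,
                                       exprs_of_size(n - k)):
            yield f"({left} {op} {right})"

def consistent(expr, examples):
    return all(eval(expr, {"x": xi}) == yi for xi, yi in examples)

def top_programs(examples, slack=1, limit=5):
    """Shortest consistent programs, plus those up to `slack` leaves longer."""
    found, mdl_size = [], None
    for n in range(1, 8):
        if mdl_size is not None and n > mdl_size + slack:
            break
        for e in exprs_of_size(n):
            if consistent(e, examples):
                mdl_size = mdl_size or n
                found.append(e)
                if len(found) >= limit:
                    return found
    return found
```

For examples like f(1)=2, f(2)=4, f(3)=6, this returns `(x + x)` and a few equal-length or slightly longer variants, which is exactly the "top C programs" list the review asks about, obtained without sampling.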
Top Pass: Improve Code Generation by Pass@k-Maximized Code Ranking
Lyu, Zhi-Cun, Li, Xin-Ye, Xie, Zheng, Li, Ming
Code generation has been greatly enhanced by recent profound advances in Large Language Models (LLMs). Nevertheless, such LLM-based code generation approaches still struggle to generate error-free code in a few tries when faced with complex problems. To address this, the prevailing strategy is to sample a huge number of candidate programs in the hope that any one of them works. However, users of code generation systems usually expect to find a correct program by reviewing or testing only a small number of candidates; otherwise, the system is unhelpful. In this paper, we propose Top Pass, a code ranking approach that identifies potentially correct solutions from a large number of candidates. Top Pass directly optimizes the pass@k loss function, enhancing the quality at the top of the candidate list. This enables the user to find the correct solution within as few tries as possible. Experimental results on four benchmarks indicate that our Top Pass method enhances the usability of code generation models by producing better ranking results, in particular achieving a 32.9% relative improvement in pass@1 on CodeContests compared to the state-of-the-art ranking method.
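The central idea of optimizing ranking quality at the top of the list so that pass@k improves can be illustrated with a hinge-style surrogate. This surrogate is an illustration under simplifying assumptions, not the paper's exact pass@k loss formulation.

```python
import numpy as np

# Hinge-style surrogate for a pass@k-oriented ranking objective:
# for each problem, at least one correct candidate should outscore
# every incorrect candidate by a margin, so that a correct program
# lands near the top of the ranked list.

def passk_surrogate_loss(scores, labels, margin=1.0):
    """scores: (n,) ranker scores; labels: (n,) with 1 = passes tests."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        return 0.0
    best_pos = pos.max()
    # Penalize every incorrect candidate that comes within `margin`
    # of (or above) the best correct candidate.
    return float(np.maximum(0.0, margin - (best_pos - neg)).mean())

def pass_at_k(scores, labels, k):
    """1.0 if any of the top-k ranked candidates is correct."""
    order = np.argsort(scores)[::-1][:k]
    return float(np.asarray(labels)[order].max())
```

Driving the surrogate down pushes some correct candidate above the incorrect ones, which is precisely what raises pass@1 for a fixed candidate pool.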
Refactoring Programs Using Large Language Models with Few-Shot Examples
Shirafuji, Atsushi, Oda, Yusuke, Suzuki, Jun, Morishita, Makoto, Watanobe, Yutaka
Less complex, more straightforward programs are easier to maintain and make it easier to write secure, bug-free code. However, due to the heavy workload involved and the risk of breaking working programs, programmers are reluctant to refactor their code, which also costs them potential learning experiences. To mitigate this, we demonstrate the use of a large language model (LLM), GPT-3.5, to suggest less complex versions of user-written Python programs, aiming to encourage users to learn how to write better programs. We propose a method for few-shot prompting of the LLM that selects the best-suited code refactoring examples for each target programming problem, based on a prior evaluation of prompting with a one-shot example. The quantitative evaluation shows that 95.68% of programs can be refactored by generating 10 candidates each, resulting in a 17.35% reduction in average cyclomatic complexity and a 25.84% decrease in the average number of lines after filtering out generated programs that are not semantically correct. Furthermore, the qualitative evaluation shows outstanding capability in code formatting, although unnecessary behaviors such as deleting or translating comments are also observed.
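The filter-and-measure step described in the abstract (keep only candidates that remain semantically correct, then compare complexity) can be sketched with a rough AST-based cyclomatic-complexity count. The `1 + branching nodes` approximation below is a simplification of the standard metric, and the I/O-test filter is an assumption about how "semantically correct" is checked, not the paper's exact procedure.

```python
import ast

# Rough cyclomatic complexity: 1 + number of branching constructs.
# A simplification of the standard metric, for illustration only.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try,
                ast.BoolOp, ast.IfExp, ast.comprehension)

def cyclomatic_complexity(source):
    tree = ast.parse(source)
    return 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(tree))

def keep_correct(candidates, func_name, tests):
    """Filter refactoring candidates that still pass the I/O tests."""
    kept = []
    for src in candidates:
        env = {}
        try:
            exec(src, env)
            if all(env[func_name](x) == y for x, y in tests):
                kept.append(src)
        except Exception:
            pass  # discard candidates that crash or fail a test
    return kept
```

Applying `cyclomatic_complexity` before and after refactoring on the surviving candidates yields exactly the kind of reduction statistic the abstract reports.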
Program Repair with Minimal Edits Using CodeT5
Shirafuji, Atsushi, Rahman, Md. Mostafizer, Amin, Md Faizul Ibne, Watanobe, Yutaka
Programmers often struggle to identify and fix bugs in their programs. In recent years, many language models (LMs) have been proposed to fix erroneous programs and support error recovery. However, LMs tend to generate solutions that differ from the original input programs, which leads to potential comprehension difficulties for users. In this paper, we propose an approach to suggest a correct program with minimal repair edits using CodeT5. We fine-tune a pre-trained CodeT5 on code pairs of wrong and correct programs and evaluate its performance against several baseline models. The experimental results show that the fine-tuned CodeT5 achieves a pass@100 of 91.95% and an average edit distance to the most similar correct program of 6.84, which indicates that at least one correct program can be suggested by generating 100 candidate programs. We demonstrate the effectiveness of LMs in suggesting program repairs with minimal edits for introductory programming problems.
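The "minimal edits" criterion, i.e. among candidate repairs that pass, prefer the one closest to the user's original program, can be sketched with a plain character-level Levenshtein distance. The fine-tuned CodeT5 model itself is not reproduced here; `is_correct` is a hypothetical callback standing in for running the candidate against the problem's tests.

```python
def levenshtein(a, b):
    """Character-level edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def closest_correct(original, candidates, is_correct):
    """Among passing candidates, pick the one closest to the original."""
    passing = [c for c in candidates if is_correct(c)]
    return min(passing, key=lambda c: levenshtein(original, c),
               default=None)
```

Averaging `levenshtein(original, best)` over a benchmark gives the "average edit distance to the most similar correct program" figure the abstract reports.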
LEVER: Learning to Verify Language-to-Code Generation with Execution
Ni, Ansong, Iyer, Srini, Radev, Dragomir, Stoyanov, Ves, Yih, Wen-tau, Wang, Sida I., Lin, Xi Victoria
The advent of large language models trained on code (code LLMs) has led to significant progress in language-to-code generation. State-of-the-art approaches in this area combine LLM decoding with sample pruning and reranking using test cases or heuristics based on the execution results. However, it is challenging to obtain test cases for many real-world language-to-code applications, and heuristics cannot well capture the semantic features of the execution results, such as data type and value range, which often indicate the correctness of the program. In this work, we propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results. Specifically, we train verifiers to determine whether a program sampled from the LLM is correct based on the natural language input, the program itself, and its execution results. The sampled programs are reranked by combining the verification score with the LLM generation probability and marginalizing over programs with the same execution results. On four datasets across the domains of table QA, math QA, and basic Python programming, LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci-002) and achieves new state-of-the-art results on all of them.
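The reranking step described above, combining the LLM's generation probability with a verifier score and then marginalizing over programs that produce the same execution result, can be sketched as follows. The numeric scores here are placeholders standing in for the LLM and the trained verifier, not LEVER's actual models.

```python
from collections import defaultdict

# LEVER-style reranking sketch: each candidate program carries its
# generation probability and a verifier score; the combined score is
# summed (marginalized) over programs whose execution results agree,
# and the execution result with the highest total mass wins.

def rerank(candidates):
    """candidates: list of (program, gen_prob, verifier_prob, exec_result).
    Returns (best program, best execution result)."""
    mass = defaultdict(float)   # total score per execution result
    rep = {}                    # best-scoring program per result
    for prog, gen_p, ver_p, result in candidates:
        score = gen_p * ver_p           # combined per-program score
        mass[result] += score           # marginalize over same result
        if result not in rep or score > rep[result][1]:
            rep[result] = (prog, score)
    best_result = max(mass, key=mass.get)
    return rep[best_result][0], best_result
```

Marginalization is what lets several low-probability programs that agree on the answer outvote a single high-probability program whose result the verifier distrusts.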